A new strategy of outlier detection for QSAR/QSPR

Authors

  • Dong-Sheng Cao
  • Yi-Zeng Liang
  • Qing-Song Xu
  • Hong-Dong Li
  • Xian Chen
Abstract

A crucial step in building a high-performance QSAR/QSPR model is the detection of outliers. Detecting outliers in a multivariate point cloud is not trivial, especially when several outliers coexist. Classical identification methods do not always find them, because they rely on the sample mean and covariance matrix, which are themselves influenced by the outliers. Moreover, existing methods emphasize only certain types of outliers rather than all of them. To avoid these problems and detect all kinds of outliers simultaneously, we propose a new strategy based on Monte-Carlo cross-validation, termed the MC method. The MC method inherently provides a feasible way to detect different kinds of outliers by establishing many cross-predictive models. The distribution of predictive residuals obtained in this way appears to reduce the risk caused by the masking effect. In addition, a new display is proposed in which the absolute value of the mean of the predictive residuals is plotted against the standard deviation of the predictive residuals. The plot divides the data into normal samples, y-direction outliers, and X-direction outliers. Several examples demonstrate the detection ability of the MC method through comparison with different diagnostic methods.
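The resampling idea described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function name `mc_residual_stats`, the parameters `n_iter` and `train_frac`, the synthetic data, and the use of ordinary least squares as the regression model are all assumptions made here for brevity (the abstract does not say which model is fitted in each split). The sketch repeatedly splits the data at random, fits a model on the training part, records predictive residuals on the held-out part, and returns for each sample the absolute mean and the standard deviation of its residuals, the two quantities the proposed plot uses.

```python
import numpy as np

def mc_residual_stats(X, y, n_iter=1000, train_frac=0.8, seed=0):
    """Hypothetical helper: Monte-Carlo cross-validation residual statistics.

    Returns (abs_mean, std), one value per sample, computed over all
    random splits in which that sample was held out for prediction.
    """
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    n_train = int(round(train_frac * n))
    Xb = np.column_stack([np.ones(n), X])  # add an intercept column

    res_sum = np.zeros(n)   # running sum of predictive residuals
    res_sq = np.zeros(n)    # running sum of squared residuals
    counts = np.zeros(n)    # how often each sample was held out

    for _ in range(n_iter):
        perm = rng.permutation(n)
        train, test = perm[:n_train], perm[n_train:]
        # Assumption: plain least squares stands in for the regression model.
        coef, *_ = np.linalg.lstsq(Xb[train], y[train], rcond=None)
        r = y[test] - Xb[test] @ coef
        res_sum[test] += r
        res_sq[test] += r ** 2
        counts[test] += 1

    counts = np.maximum(counts, 1)  # guard for samples never held out
    mean = res_sum / counts
    var = np.maximum(res_sq / counts - mean ** 2, 0.0)
    return np.abs(mean), np.sqrt(var)

# Synthetic demonstration data (not from the paper).
rng = np.random.default_rng(1)
X_demo = rng.normal(size=(40, 3))
y_demo = X_demo @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.1, size=40)
y_demo[0] += 5.0  # inject one y-direction outlier at index 0

abs_mean, std = mc_residual_stats(X_demo, y_demo, n_iter=500)
print("sample with largest |mean residual|:", int(np.argmax(abs_mean)))
```

Plotting `abs_mean` against `std` with any plotting library would give a display of the kind the abstract describes; where to draw the boundaries between normal samples and the two outlier types is not specified in the abstract and is left open here.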


Similar resources

Toward better QSAR/QSPR modeling: simultaneous outlier detection and variable selection using distribution of model features

Building a robust and reliable QSAR/QSPR model requires attention to two aspects: selecting the optimal variable subset from a large pool of molecular descriptors and detecting outliers from a pool of samples. The two problems are similar and, to some extent, complementary. Given a particular learning algorithm on a particular data set, one should consider how the interaction...


A novel topological descriptor based on the expanded Wiener index: Applications to QSPR/QSAR studies

In this paper, a novel topological index, named the M-index, is introduced based on the expanded form of the Wiener matrix. To construct this index, the atomic characteristics and the interaction of the vertices in a molecule are taken into account. The usefulness of the M-index is demonstrated by several QSPR/QSAR models for different physico-chemical properties and biological activities of a large...


QSPR Analysis with Curvilinear Regression Modeling and Topological Indices

Topological indices are real numbers associated with a molecular structure, obtained from its molecular graph G. Topological indices are used for QSPR, QSAR and structural design in chemistry, nanotechnology, and pharmacology. Moreover, physicochemical properties such as the boiling point, the enthalpy of vaporization, and stability can be estimated by QSAR/QSPR models. In this study, the QSPR (Quantitative St...


7. Orthogonalization methods in QSPR-QSAR Studies

We discuss some features of the orthogonalization methods commonly applied to QSPR-QSAR studies. We outline the well-known multivariable linear regression analysis in vector form in order to compare mainly the Randic and Gram-Schmidt orthogonalization procedures and also lay the basis for other approaches such as Löwdin’s. We expect that the present review may become the starting point for future dev...


QSPR designer – employ your own descriptors in the automated QSAR modeling process

The prediction of physical and chemical properties of molecules is a very important step in the drug discovery process. QSAR and QSPR models are strong tools for predicting these properties. The models employ descriptors and statistical approaches to provide an estimation of the desired property. An abundance of descriptors and QSAR/QSPR models were published, but the prediction of some propert...



Journal:
  • Journal of computational chemistry

Volume 31, Issue 3

Pages: -

Publication date: 2010